The inclusion of a macroscopic adaptive threshold is studied for the retrieval dynamics of both layered
feedforward and fully connected neural network models with synaptic noise. These two types of architectures
require a different method to be solved numerically. In both cases it is shown that, if the threshold
is chosen appropriately as a function of the cross-talk noise and of the activity of the stored patterns,
adapting itself automatically in the course of the recall process, an autonomous functioning of the network
is guaranteed. This self-control mechanism considerably improves the quality of retrieval, in particular the
storage capacity, the basins of attraction and the mutual information content.



1. Introduction 

In general pattern recognition problems, information is mostly encoded by a small fraction of bits,
and in neurophysiological studies the activity level of real neurons is also found to be low, such that
any reasonable network model has to allow variable activity of the neurons. The limit of low activity,
i.e., sparse coding, is then especially interesting. Indeed, sparsely coded models have a very large
storage capacity behaving as $1/(a \ln a)$ for small $a$, where $a$ is the activity (see, e.g., [1, 2, 3, 4]
and references therein). However, for low activity the basins of attraction might become very small and
the information content in a single pattern is reduced [4]. Therefore, the necessity for a control
of the activity of the neurons has been emphasized, such that the latter stays the same as the activity of
the stored patterns during the recall process. This has led to several discussions imposing external
constraints on the dynamics of the network. However, the enforcement of such a constraint at every time
step destroys part of the autonomous functioning of the network, i.e., a functioning that has to be
independent precisely from such external constraints or control mechanisms. To solve this problem, quite
recently a self-control mechanism has been introduced in the dynamics of networks for so-called diluted
architectures [5]. This self-control mechanism introduces a time-dependent threshold in the transfer
function [3 [3]. It is determined as a function of both the cross-talk noise and the activity of the
stored patterns in the network, and adapts itself in the course of the recall process. It furthermore
allows one to reach optimal retrieval performance both in the absence and in the presence of synaptic
noise El [S]. These diluted architectures contain no common ancestor nodes, in contrast with feedforward
architectures. It has then been shown that a similar mechanism can be introduced successfully
for layered feedforward architectures, but without synaptic noise [9]. Also for fully connected neural
networks, the idea of self-control has been partially exploited for three-state neurons [10]. However, due
to the feedback correlations present in such an architecture, the dynamics had to be solved approximately
and, again, without synaptic noise.

The purpose of the present work is twofold: to 
generalise this self-control mechanism for layered 
architectures when synaptic noise is allowed, and 
to extend the idea of self-control in fully connected 
networks with exact dynamics and synaptic noise. 
In both cases it can be shown that it leads to a sub- 
stantial improvement of the quality of retrieval, in 
particular the storage capacity, the basins of attrac- 
tion and the mutual information content. 

The rest of the paper is organized as follows. In Sections 2 and 3 the layered network is treated. The
precise formulation of the layered model is given in Section 2 and the adaptive threshold dynamics
is studied in Section 3. In Sections 4 and 5 the fully connected network is studied. The model set-up
and its exact threshold dynamics are described in Section 4; the numerical treatment and results are
presented in Section 5. Finally, Section 6 contains the conclusions.

2. The layered model 

Consider a neural network composed of binary neurons arranged in layers, each layer containing $N$
neurons. A neuron can take values $\sigma_i(t) \in \{0,1\}$, where $t = 1, \ldots, L$ is the layer index
and $i = 1, \ldots, N$ labels the neurons. Each neuron on layer $t$ is unidirectionally connected to all
neurons on layer $t+1$. We want to memorize $p$ patterns $\{\xi_i^\mu(t)\}$, $i = 1, \ldots, N$,
$\mu = 1, \ldots, p$, on each layer $t$, taking the values $\{0, 1\}$. They are assumed to be independent
identically distributed random variables with respect to $i$, $\mu$ and $t$, determined by the
probability distribution

$$ p(\xi_i^\mu(t)) = a\,\delta(\xi_i^\mu(t) - 1) + (1-a)\,\delta(\xi_i^\mu(t)). \tag{1} $$

From this form we find that $E[\xi_i^\mu(t)] = E[(\xi_i^\mu(t))^2] = a$, so that the patterns have
expectation value $a$ and variance $a(1-a)$. Moreover, no statistical correlations occur; in fact,
for $\mu \neq \nu$ the covariance vanishes.
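As a small illustration of the distribution (1), the patterns can be drawn site by site as independent Bernoulli variables. The sketch below is only an example; the function name `sample_patterns`, the seed and the sizes are assumptions made for the illustration and do not appear in the text.

```python
import random


def sample_patterns(p, n, a, rng):
    """Draw p binary patterns of n sites; each site equals 1 with probability a."""
    return [[1 if rng.random() < a else 0 for _ in range(n)] for _ in range(p)]


# Hypothetical sizes and seed, chosen only to make the example reproducible.
rng = random.Random(0)
patterns = sample_patterns(p=5, n=20000, a=0.1, rng=rng)

# The empirical mean activity should be close to a.
mean_activity = sum(sum(row) for row in patterns) / (5 * 20000)
```

For sparse coding one would take a much smaller activity, such as the value a = 0.005 used in the figures.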

The state $\sigma_i(t+1)$ of neuron $i$ on layer $t+1$ is determined by the state of the neurons on the
previous layer $t$ according to the stochastic rule

$$ P(\sigma_i(t+1) \mid \boldsymbol{\sigma}(t)) = \left\{ 1 + \exp\left[ 2\,(1 - 2\sigma_i(t+1))\,\beta\, h_i(t) \right] \right\}^{-1} \tag{2} $$

with $\boldsymbol{\sigma}(t) = (\sigma_1(t), \sigma_2(t), \ldots, \sigma_N(t))$. The right-hand side is the
logistic function. The "temperature" $T = 1/\beta$ controls the stochasticity of the network dynamics;
it measures the synaptic noise level [11]. Given the network state $\boldsymbol{\sigma}(t)$ on layer $t$,
the so-called "local field" $h_i(t)$ of neuron $i$ on the next layer $t+1$ is given by

$$ h_i(t) = \sum_{j=1}^{N} J_{ij}(t)\,(\sigma_j(t) - a) - \theta(t) \tag{3} $$

with $\theta(t)$ the threshold to be specified later. The couplings $J_{ij}(t)$ are the synaptic strengths
of the interaction between neuron $j$ on layer $t$ and neuron $i$ on layer $t+1$. They depend on the
stored patterns at different layers according to the covariance rule

$$ J_{ij}(t) = \frac{1}{N a (1-a)} \sum_{\mu=1}^{p} (\xi_i^\mu(t+1) - a)(\xi_j^\mu(t) - a). \tag{4} $$

These couplings then permit the storage of sets of patterns to be retrieved by the layered network.
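The covariance rule and the resulting local field can be sketched in a few lines. The example assumes the covariance form $J_{ij}(t) = \frac{1}{Na(1-a)} \sum_\mu (\xi_i^\mu(t+1) - a)(\xi_j^\mu(t) - a)$ and a local field $h_i(t) = \sum_j J_{ij}(t)(\sigma_j(t) - a) - \theta(t)$; the function names and the tiny four-neuron patterns are hypothetical and serve only as an illustration.

```python
def covariance_couplings(xi_next, xi_prev, a):
    """Couplings between layer t (xi_prev[mu][j]) and layer t+1 (xi_next[mu][i])."""
    n = len(xi_prev[0])
    norm = n * a * (1.0 - a)
    return [
        [
            sum((xn[i] - a) * (xp[j] - a) for xn, xp in zip(xi_next, xi_prev)) / norm
            for j in range(n)
        ]
        for i in range(len(xi_next[0]))
    ]


def local_fields(couplings, sigma, a, theta):
    """Local field on the next layer for a given state sigma of the current layer."""
    return [
        sum(j_ij * (s - a) for j_ij, s in zip(row, sigma)) - theta
        for row in couplings
    ]


# One stored pattern per layer, N = 4, activity a = 1/4 (illustrative values).
a = 0.25
xi_prev = [[1, 0, 0, 0]]
xi_next = [[0, 1, 0, 0]]
J = covariance_couplings(xi_next, xi_prev, a)

# With perfect recall on layer t and zero threshold the field equals xi_i(t+1) - a.
h = local_fields(J, xi_prev[0], a, theta=0.0)
```

In this noiseless single-pattern case the field is positive exactly on the active site of the next layer, which is the signal term that the threshold has to separate from the cross-talk noise once many patterns are stored.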

The dynamics of this network is defined as follows (see [12]). Initially the first layer (the input)
is externally set in some fixed state. In response to that, all neurons of the second layer update
synchronously at the next time step, according to the stochastic rule (2), and so on.

At this point we remark that the couplings (4) are of infinite range (each neuron interacts with
infinitely many others) such that our model allows a so-called mean-field theory approximation. This
essentially means that we focus on the dynamics of a single neuron while replacing all the other
neurons by an average background local field. In other words, no fluctuations of the other neurons
are taken into account. In our case this approximation becomes exact because, crudely speaking, $h_i(t)$
is the sum of very many terms and a central limit theorem can be applied [11].

It is standard knowledge by now that mean-field 
theory dynamics can be solved exactly for these lay- 
ered architectures (e.g., [HI US]). By exact analytic 
treatment we mean that, given the state of the first 
layer as initial state, the state on layer t that results 
from the dynamics is predicted by recursion formu- 
las. This is essentially due to the fact that the rep- 
resentations of the patterns on different layers are 
chosen independently. Hence, the big advantage is 
that this will allow us to determine the effects from 
self-control in an exact way. 

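The relevant parameters describing the solution of this dynamics are the main overlap of the state
of the network with the $\mu$-th pattern, and the neural activity of the neurons

$$ M^\mu(t) = \frac{1}{N a (1-a)} \sum_{i=1}^{N} (\xi_i^\mu(t) - a)(\sigma_i(t) - a), \tag{5} $$

$$ q(t) = \frac{1}{N} \sum_{i=1}^{N} \sigma_i(t). \tag{6} $$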
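For a finite network the two order parameters can be evaluated directly from a pattern and a network state. The sketch below assumes the normalised forms $M = \frac{1}{Na(1-a)}\sum_i (\xi_i - a)(\sigma_i - a)$ and $q = \frac{1}{N}\sum_i \sigma_i$; the function name and the ten-site example are hypothetical.

```python
def overlap_and_activity(xi, sigma, a):
    """Main overlap M with the pattern xi and neural activity q of the state sigma."""
    n = len(xi)
    overlap = sum((x - a) * (s - a) for x, s in zip(xi, sigma)) / (n * a * (1.0 - a))
    activity = sum(sigma) / n
    return overlap, activity


# A pattern with exactly 2 active sites out of 10, i.e. activity a = 0.2.
xi = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
M_perfect, q_perfect = overlap_and_activity(xi, xi, a=0.2)        # perfect recall
M_silent, q_silent = overlap_and_activity(xi, [0] * 10, a=0.2)    # silent network
```

Perfect recall gives an overlap of one with activity equal to that of the pattern, while a silent network has zero overlap, which is why the activity has to be monitored together with the overlap.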

In order to measure the retrieval quality of the recall process, we use the mutual information function
[SI ini [131 [TS]. In general, it measures the average amount of information that can be received
by the user by observing the signal at the output of a channel [TBI [T7]. For the recall process of
stored patterns that we are discussing here, at each layer the process can be regarded as a channel with
input $\xi_i^\mu(t)$ and output $\sigma_i(t)$, such that this mutual information function can be defined
as [51 [T^

$$ I(\sigma_i(t); \xi_i^\mu(t)) = S(\sigma_i(t)) - \langle S(\sigma_i(t) \mid \xi_i^\mu(t)) \rangle_{\xi^\mu(t)} \tag{7} $$

where $S(\sigma_i(t))$ and $S(\sigma_i(t) \mid \xi_i^\mu(t))$ are the entropy and the conditional entropy
of the output, respectively

$$ S(\sigma_i(t)) = -\sum_{\sigma_i} p(\sigma_i(t)) \ln[p(\sigma_i(t))] \tag{8} $$

$$ S(\sigma_i(t) \mid \xi_i^\mu(t)) = -\sum_{\sigma_i} p(\sigma_i(t) \mid \xi_i^\mu(t)) \ln[p(\sigma_i(t) \mid \xi_i^\mu(t))]. \tag{9} $$

These information entropies are peculiar to the probability distributions of the output. The quantity
$p(\sigma_i(t))$ denotes the probability distribution for the neurons at layer $t$ and
$p(\sigma_i(t) \mid \xi_i^\mu(t))$ indicates the conditional probability that the $i$-th neuron is
in a state $\sigma_i(t)$ at layer $t$ given that the $i$-th site of the pattern to be retrieved is
$\xi_i^\mu(t)$. Hereby, we have assumed that the conditional probability of all the neurons factorizes,
i.e., $p(\{\sigma_i(t)\} \mid \{\xi_i(t)\}) = \prod_j p(\sigma_j(t) \mid \xi_j(t))$, which is a consequence
of the mean-field theory character of our model explained above. We remark that a similar factorization
has also been used in Schwenker et al. [IS].

The calculation of the different terms in the expression (7) proceeds as follows. Because of the
mean-field character of our model the following formulas hold for every neuron $i$ on each layer $t$.
Formally writing (forgetting about the pattern index $\mu$)
$\langle O \rangle \equiv \langle \langle O \rangle_{\sigma|\xi} \rangle_\xi = \sum_\xi p(\xi) \sum_\sigma p(\sigma|\xi)\, O$
for an arbitrary quantity $O$, the conditional probability can be obtained in a rather straightforward way
by using the complete knowledge about the system:
$\langle \xi \rangle = a$, $\langle \sigma \rangle = q$,
$\langle (\sigma - a)(\xi - a) \rangle = a(1-a) M$, $\langle 1 \rangle = 1$.

The result reads

$$ p(\sigma \mid \xi) = [\gamma_0 + (\gamma_1 - \gamma_0)\xi]\, \delta(\sigma - 1) + [1 - \gamma_0 - (\gamma_1 - \gamma_0)\xi]\, \delta(\sigma) \tag{10} $$

where $\gamma_0 = q - aM$ and $\gamma_1 = (1-a)M + q$, and where the $M$ and $q$ are precisely the
relevant parameters (5) and (6) for large $N$. Using the probability distribution of the patterns we obtain

$$ p(\sigma) = q\, \delta(\sigma - 1) + (1 - q)\, \delta(\sigma). \tag{11} $$

Hence the entropy (8) and the conditional entropy (9) become

$$ S(\sigma) = -q \ln q - (1-q) \ln(1-q) \tag{12} $$

$$ S(\sigma \mid \xi) = -[\gamma_0 + (\gamma_1 - \gamma_0)\xi] \ln[\gamma_0 + (\gamma_1 - \gamma_0)\xi] - [1 - \gamma_0 - (\gamma_1 - \gamma_0)\xi] \ln[1 - \gamma_0 - (\gamma_1 - \gamma_0)\xi]. \tag{13} $$

By averaging the conditional entropy over the pattern $\xi$ we finally get for the mutual information
function (7) for the layered model

$$ I(\sigma; \xi) = -q \ln q - (1-q) \ln(1-q) + a[\gamma_1 \ln \gamma_1 + (1 - \gamma_1) \ln(1 - \gamma_1)] + (1-a)[\gamma_0 \ln \gamma_0 + (1 - \gamma_0) \ln(1 - \gamma_0)]. \tag{14} $$
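The closed form (14) is straightforward to evaluate once $a$, $M$ and $q$ are known. The sketch below is a direct transcription of that expression with $\gamma_0 = q - aM$ and $\gamma_1 = q + (1-a)M$; the helper names are hypothetical, and the convention $0 \ln 0 = 0$ is made explicit.

```python
import math


def _xlogx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    return 0.0 if x <= 0.0 else x * math.log(x)


def _neg_binary_entropy(p):
    """p ln p + (1 - p) ln(1 - p), i.e. minus the entropy of a binary variable."""
    return _xlogx(p) + _xlogx(1.0 - p)


def mutual_information(a, overlap, activity):
    """Mutual information I between pattern and neuron state, as in Eq. (14)."""
    gamma0 = activity - a * overlap
    gamma1 = activity + (1.0 - a) * overlap
    return (
        -_neg_binary_entropy(activity)
        + a * _neg_binary_entropy(gamma1)
        + (1.0 - a) * _neg_binary_entropy(gamma0)
    )


# Perfect retrieval (M = 1, q = a): I equals the entropy of the pattern itself.
i_perfect = mutual_information(0.1, 1.0, 0.1)
# No overlap (M = 0): the output carries no information about the pattern.
i_none = mutual_information(0.1, 0.0, 0.3)
```

The information content used in the figures is obtained by multiplying this single-site quantity by the loading, $i = \alpha I$.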



3. Adaptive thresholds in the layered network

It is standard knowledge (e.g., [12]) that the synchronous dynamics for layered architectures can be
solved exactly following the method based upon a signal-to-noise analysis of the local field (3) (e.g.,
[ll[I31[ini[2U] and references therein). Without loss of generality we focus on the recall of one pattern,
say $\mu = 1$, meaning that only $M^1(t)$ is macroscopic, i.e., of order 1, and the rest of the patterns
cause a cross-talk noise at each step of the dynamics.

We suppose that the initial state of the network model $\{\sigma_i(1)\}$ is a collection of independent
identically distributed random variables, with average and variance given by
$E[\sigma_i(1)] = E[(\sigma_i(1))^2] = q_0$. We furthermore assume that this state is correlated
with only one stored pattern, say pattern $\mu = 1$, such that
$\mathrm{Cov}(\xi_i^\mu(1), \sigma_i(1)) = \delta_{\mu,1}\, M_0^1\, a(1-a)$.
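An initial state with prescribed activity $q_0$ and overlap $M_0$ can be generated site by site. The sketch below assumes the conditional firing probability implied by Eq. (10), $P(\sigma = 1 \mid \xi) = q_0 + M_0(\xi - a)$; the function names are hypothetical and the parameter values are only an example.

```python
import random


def initial_firing_probability(xi, m0, q0, a):
    """P(sigma = 1 | xi) for an initial state with overlap m0 and activity q0."""
    return q0 + m0 * (xi - a)


def sample_initial_state(pattern, m0, q0, a, rng):
    """Draw an initial layer state correlated with the given pattern."""
    return [
        1 if rng.random() < initial_firing_probability(x, m0, q0, a) else 0
        for x in pattern
    ]


a, q0, m0 = 0.1, 0.1, 0.5
p_active = initial_firing_probability(1, m0, q0, a)    # sites where the pattern is 1
p_inactive = initial_firing_probability(0, m0, q0, a)  # sites where the pattern is 0

state = sample_initial_state([1, 0, 0, 1, 0], m0, q0, a, random.Random(1))
```

Averaging the two probabilities with weights $a$ and $1-a$ returns $q_0$, and their difference is $M_0$. Both probabilities must stay between 0 and 1, which restricts the admissible pairs $(M_0, q_0)$.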

Then the full recall process is described by [H [13]

$$ M^1(t+1) = \frac{1}{2} \int Dx\, \left( \tanh[\beta F_1] - \tanh[\beta F_2] \right) \tag{15} $$

$$ q(t+1) = a M^1(t+1) + \frac{1}{2} \left\{ 1 + \int Dx\, \tanh[\beta F_2] \right\} \tag{16} $$

$$ D(t+1) = Q(t+1) + \frac{\beta}{2} \left\{ 1 - a \int Dx\, \tanh^2[\beta F_1] - (1-a) \int Dx\, \tanh^2[\beta F_2] \right\} D(t) \tag{17} $$

with

$$ F_1 = (1-a) M^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \tag{18} $$

$$ F_2 = -a M^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \tag{19} $$

and $\alpha = p/N$. Here $Dx$ is the Gaussian measure $Dx = dx\,(2\pi)^{-1/2} \exp(-x^2/2)$,
$Q(t) = (1-2a)\,q(t) + a^2$, and $D(t)$ contains the influence of the cross-talk noise caused by the
patterns $\mu > 1$. As mentioned before, $\theta(t)$ is an adaptive threshold that has to be chosen.
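One step of the recursion for the overlap and the activity can be sketched numerically as follows. This is an illustration under stated assumptions, not the procedure used for the figures. It takes the overlap to be the difference between the firing probabilities of active and inactive sites, $M^1(t+1) = \frac{1}{2}\int Dx\,(\tanh[\beta F_1] - \tanh[\beta F_2])$, and the activity $q(t+1) = aM^1(t+1) + \frac{1}{2}\{1 + \int Dx \tanh[\beta F_2]\}$, with $F_1$ and $F_2$ as in (18) and (19). The Gaussian averages are computed with a plain trapezoidal rule, and the update of $D(t)$ is left out. All function names are hypothetical.

```python
import math


def gauss_average(f, n=2001, cutoff=8.0):
    """Approximate the Gaussian average int Dx f(x) by the trapezoidal rule."""
    step = 2.0 * cutoff / (n - 1)
    total = 0.0
    for k in range(n):
        x = -cutoff + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * step / math.sqrt(2.0 * math.pi)


def layer_step(overlap, noise_d, theta, a, alpha, beta):
    """Return (M(t+1), q(t+1)) from M(t), D(t) and the threshold theta(t)."""
    spread = math.sqrt(alpha * noise_d)
    t1 = gauss_average(lambda x: math.tanh(beta * ((1.0 - a) * overlap - theta + spread * x)))
    t2 = gauss_average(lambda x: math.tanh(beta * (-a * overlap - theta + spread * x)))
    next_overlap = 0.5 * (t1 - t2)
    next_activity = a * next_overlap + 0.5 * (1.0 + t2)
    return next_overlap, next_activity


# Without cross-talk noise and at low temperature a perfectly recalled pattern is stable.
M_next, q_next = layer_step(overlap=1.0, noise_d=0.0, theta=0.4, a=0.1, alpha=1.0, beta=50.0)
# Without any initial overlap no overlap is generated.
M_zero, _ = layer_step(overlap=0.0, noise_d=1.0, theta=0.3, a=0.1, alpha=0.5, beta=5.0)
```

The second call shows the role of the threshold: with $M = 0$ both fields coincide, so only the activity, not the overlap, is affected by $\theta$.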

In the sequel we discuss two different choices and both will be compared for networks with synaptic
noise and various activities. Of course, it is known that the quality of the recall process is influenced
by the cross-talk noise. An idea is then to introduce a threshold that adapts itself autonomously
in the course of the recall process and that counters, at each layer, the cross-talk noise. This is the
self-control method proposed in [S]. This has been studied for layered neural network models without
synaptic noise, i.e., at $T = 0$, where the rule (2) reduces to the deterministic form
$\sigma_i(t+1) = \Theta(h_i(t))$ with $\Theta(x)$ the Heaviside function taking the values $\{0, 1\}$.
For sparsely coded models, meaning that the pattern activity $a$ is very small and tends to zero for
$N$ large, it has been found [9] that

$$ \theta(t)_{sc} = c(a) \sqrt{\alpha D(t)}, \qquad c(a) = \sqrt{-2 \ln a} \tag{20} $$

makes the second term on the r.h.s. of Eq. (16), at $T = 0$, asymptotically vanish faster than $a$
such that $q \sim a$. It turns out that the inclusion of this self-control threshold considerably improves
the quality of retrieval, in particular the storage capacity, the basins of attraction and the information
content.



The second approach chooses a threshold by maximizing the information content, $i = \alpha I$, of the
network (recall Eq. (14)). This function depends on $M^1(t)$, $q(t)$, $a$, $\alpha$ and $\beta$. The
evolution of $M^1(t)$ and of $q(t)$, Eqs. (15) and (16), depends on the specific choice of the threshold
through the local field (3). We consider a layer-independent threshold $\theta(t) = \theta$ and
calculate the value of (14) for fixed $a$, $\alpha$, $M_0$, $q_0$ and $\beta$. The optimal threshold,
$\theta = \theta_{opt}$, is then the one for which the mutual information function is maximal. The latter
is non-trivial because it is even rather difficult, especially in the limit of sparse coding, to choose
a threshold interval by hand such that $i$ is non-zero. The computational cost will thus be larger
compared to the one of the self-control approach. To illustrate this we plot in Fig. 1 the information
content $i$ as a function of $\theta$ without self-control or a priori optimization, for $a = 0.005$ and
different values of $\alpha$. For every value of $\alpha$ below its critical value, there is a range for
the threshold where the information content is different from zero and hence, retrieval is possible.
This retrieval range becomes very small when the storage capacity approaches its critical value
$\alpha_c = 6.4$.

Figure 1: The information $i = \alpha I$ as a function of $\theta$ for $a = 0.005$, $T = 0.1$ and several
values of the load parameter $\alpha = 0.1, 1, 2, 4, 6$ (bottom to top).

Concerning then the self-control approach, the next problem to be posed, in analogy with the case
without synaptic noise, is the following one. Can one determine a form for the threshold $\theta(t)$ such
that the integral in the second term on the r.h.s. of Eq. (16) at $T \neq 0$ vanishes asymptotically
faster than $a$?

In contrast with the case at zero temperature where, due to the simple form of the transfer function,
this threshold could be determined analytically (recall Eq. (20)), a detailed study of the
asymptotics of the integral in Eq. (16) gives no satisfactory analytic solution. Therefore, we have
designed a systematic numerical procedure through the following steps:

• Choose a small value for the activity $a'$.

• Determine through numerical integration the threshold $\theta'$ such that

$$ \int_{-\infty}^{\infty} \frac{dx\, e^{-x^2/2\sigma^2}}{\sigma\sqrt{2\pi}}\; \Theta(x - \theta) < a' \quad \text{for } \theta > \theta' \tag{21} $$

for different values of the variance $\sigma^2 = \alpha D(t)$.

• Determine, as a function of $T = 1/\beta$, the value for $\theta'_T$ such that for
$\theta > \theta' + \theta'_T$

$$ \int_{-\infty}^{\infty} \frac{dx\, e^{-x^2/2\sigma^2}}{2\sigma\sqrt{2\pi}}\; [1 + \tanh[\beta(x - \theta)]] < a'. \tag{22} $$
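The three steps above can be sketched as follows. This is a minimal illustration rather than the procedure actually used. It assumes the zero-temperature condition is the Gaussian tail probability and the finite-temperature one is the Gaussian average of $\frac{1}{2}[1 + \tanh \beta(x - \theta)]$. The trapezoidal rule, the bisection search and all function names are choices made for this example.

```python
import math


def sharp_tail(theta, sigma):
    """Zero-temperature condition: P(x > theta) for x ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(theta / (sigma * math.sqrt(2.0)))


def smooth_tail(theta, sigma, beta, n=4001, cutoff=10.0):
    """Finite-temperature condition: Gaussian average of [1 + tanh(beta (x - theta))] / 2."""
    step = 2.0 * cutoff * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        x = -cutoff * sigma + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        gauss = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * gauss * 0.5 * (1.0 + math.tanh(beta * (x - theta)))
    return total * step


def smallest_threshold(tail, target, lo, hi, iterations=60):
    """Bisection for the point where the decreasing function tail(theta) drops to target."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


a_prime, sigma, beta = 0.005, 1.0, 10.0   # illustrative values, T = 1/beta = 0.1
theta_0 = smallest_threshold(lambda th: sharp_tail(th, sigma), a_prime, 0.0, 20.0)
theta_full = smallest_threshold(lambda th: smooth_tail(th, sigma, beta), a_prime, 0.0, 20.0)
theta_T = theta_full - theta_0            # temperature-dependent part
```

Repeating the search for several values of $\sigma$ and $T$ gives the numerical data from which a functional form for the threshold can be read off. In this example the choice $\sigma\sqrt{-2\ln a'}$ lies above the numerically determined $\theta'$ and therefore satisfies the first condition.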



The second step leads precisely to a threshold having the form of Eq. (20). The third step, determining
the temperature-dependent part $\theta'_T$, leads to the final proposal

$$ \theta_t(a, T) = \sqrt{-2 \ln(a)\, \alpha D(t)} - \frac{1}{2} \ln(a)\, T^2. \tag{23} $$

This dynamical threshold is again a macroscopic parameter, thus no average must be taken over the
microscopic random variables at each step $t$ of the recall process.
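Written as a function, the self-control threshold only needs the macroscopic quantities $a$, $\alpha$, $D(t)$ and $T$. The sketch below assumes the form $\theta_t(a,T) = \sqrt{-2\ln(a)\,\alpha D(t)} - \frac{1}{2}\ln(a)\,T^2$, which reduces to the zero-temperature threshold $c(a)\sqrt{\alpha D(t)}$ for $T = 0$; the function name and the numerical values are hypothetical.

```python
import math


def self_control_threshold(a, alpha, noise_d, temperature=0.0):
    """Self-control threshold: zero-temperature part plus the T-dependent correction."""
    sharp_part = math.sqrt(-2.0 * math.log(a) * alpha * noise_d)
    thermal_part = -0.5 * math.log(a) * temperature ** 2
    return sharp_part + thermal_part


a, alpha, noise_d = 0.005, 1.0, 0.005   # illustrative values
theta_cold = self_control_threshold(a, alpha, noise_d)
theta_warm = self_control_threshold(a, alpha, noise_d, temperature=0.2)
```

Since $\ln a < 0$ for $a < 1$, the temperature correction always raises the threshold, and it does so more strongly for sparser patterns.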

We have solved these self-controlled dynamics, Eqs. (15)-(17) and (23), for our model with synaptic
noise, in the limit of sparse coding, numerically. In particular, we have studied in detail the influence
of the $T$-dependent part of the threshold. Of course, we are only interested in the retrieval solutions
with $M > 0$ (we forget about the index 1) and carrying a non-zero information $i = \alpha I$. The
important features of the solution are illustrated, for a typical value of $a$, in Figs. 2-5. In Fig. 2
we show the basin of attraction for the whole retrieval phase for the model with threshold (20) (dashed
curves) compared to the model with the noise-dependent threshold (23) (full curves). We see that there is
no clear improvement for low $T$ but there is a substantial one for higher $T$. Even near the border of
critical storage the results are still improved such that also the storage capacity itself is larger.

Figure 2: The basin of attraction as a function of $\alpha$ for $a = 0.005$ and $T = 0.2, 0.15, 0.1, 0.05$
(from left to right) with (full lines) and without (dashed lines) the $T$-dependent part in the threshold.

This is further illustrated in Fig. 3 where we compare the evolution of the retrieval overlap $M(t)$
starting from several initial values, $M_0$, for the model without (Fig. 3(a)) and with (Fig. 3(b))
the $T$-correction in the threshold and for the optimal threshold model (Fig. 3(c)). Here this
temperature correction is absolutely crucial to guarantee retrieval, i.e., $M \approx 1$. It really makes
the difference between retrieval and non-retrieval in the model. Furthermore, the model with the
self-control threshold with noise-correction has even a wider basin of attraction than the model with
optimal threshold.

Figure 3: The evolution of the main overlap $M(t)$ for several initial values $M_0$ with $T = 0.2$,
$q_0 = a = 0.005$, $\alpha = 1$ for the self-control model without (a) and with the $T$-dependent part (b)
and for the optimal threshold model (c).

In Fig. 4 we plot the information content $i$ as a function of the temperature for the self-control
dynamics with the threshold (23) (full curves) and with the threshold (20) (dashed curves). We see that
a substantial improvement of the information content is obtained.

Figure 4: The information content $i = \alpha I$ as a function of $T$ for several values of the loading
$\alpha$ and $a = 0.005$ with (full lines) and without (dashed lines) the $T$-correction in the threshold.

Finally we show in Fig. 5 a $T$-$\alpha$ plot for $a = 0.005$ (a) and $a = 0.02$ (b) with (full line) and
without (dashed line) noise-correction in the self-control threshold and with optimal threshold (dotted
line). These lines indicate two phases of the layered model: below the lines our model allows recall,
above the lines it does not. For $a = 0.005$ we see that the $T$-dependent term in the self-control
threshold leads to a big improvement in the region for large noise and small loading and in the region of
critical loading. For $a = 0.02$ the results for the self-control threshold with and without
noise-correction and those for the optimal thresholds almost coincide, but we recall that the calculation
with self-control is autonomously done by the network and less demanding computationally.

Figure 5: Phases in the $T$-$\alpha$ plane for $a = 0.005$ (a) and $a = 0.02$ (b) with (full line) and
without (dashed line) the temperature correction in the self-control threshold and with optimal threshold
(dotted line).

In the next Sections we want to find out whether this self-control mechanism also works in the fully
connected network, for which we work out the dynamics in the presence of synaptic noise in an exact
way. We start by defining the model and describing this dynamics.

4. Dynamics of the fully connected model 

As before, the network we consider consists of $N$ binary neurons $\sigma_i \in \{0, 1\}$,
$i = 1 \ldots N$, but the couplings $J_{ij}$ between each pair of neurons $\sigma_i$ and $\sigma_j$ are
now given by the following rule

$$ J_{ij} = \frac{1}{N a (1-a)} \sum_{\mu=1}^{p} (\xi_i^\mu - a)(\xi_j^\mu - a). \tag{24} $$

The local field is now determined by

$$ h_i(\boldsymbol{\sigma}, t) = \sum_{j \neq i}^{N} J_{ij}\, \sigma_j(t) + \theta(q). \tag{25} $$

The threshold is represented by the function $\theta$ and, based upon the results obtained in the
previous sections and in [10], we have chosen this to be a function of the mean activity $q$ of the
neurons.

In order to study the dynamics of this model we need to define the transition probabilities for going
from one state of the network to another. For each neuron at time $t+1$, $\sigma_i(t+1)$, we have the
following stochastic rule (compare (2))

$$ P(\sigma_i(t+1) \mid \boldsymbol{\sigma}(t)) = \frac{\exp(-\beta\, \epsilon(\sigma_i(t+1) \mid \boldsymbol{\sigma}(t)))}{\sum_{s} \exp(-\beta\, \epsilon(s \mid \boldsymbol{\sigma}(t)))} \tag{26} $$

where

$$ \epsilon(\sigma_i(t+1) \mid \boldsymbol{\sigma}(t)) = -\sigma_i(t+1)\, h_i(\boldsymbol{\sigma}(t)) \tag{27} $$

with the local fields given by (25) and where $\boldsymbol{\sigma}(0)$ at time $t = 0$ is the known
starting configuration.
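For a binary neuron the stochastic rule reduces to a logistic firing probability. The sketch below assumes the energy $\epsilon(\sigma \mid \boldsymbol{\sigma}(t)) = -\sigma h$ and a normalisation over $\sigma \in \{0, 1\}$, which gives $P(\sigma = 1) = e^{\beta h}/(1 + e^{\beta h})$. The function names are hypothetical, and the branch on the sign of $\beta h$ is only there to avoid numerical overflow.

```python
import math
import random


def firing_probability(field, beta):
    """P(sigma = 1 | h) = exp(beta h) / (1 + exp(beta h)), evaluated stably."""
    z = beta * field
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_neuron(field, beta, rng):
    """Draw the new state of one neuron from its local field."""
    return 1 if rng.random() < firing_probability(field, beta) else 0


rng = random.Random(0)
new_state = update_neuron(0.3, beta=25.0, rng=rng)   # beta = 1/T with T = 0.04
p_zero_field = firing_probability(0.0, 25.0)
```

A vanishing field gives a firing probability of exactly one half, which is the reason why a bare field of order $-a$ on inactive sites is problematic for small activities.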

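The dynamics is then described using the generating function analysis, which was introduced in
PT] to the field of statistical mechanics and, by now, is part of many textbooks. The idea of this
approach to study dynamics [lU [52] is to look at the probability to find a certain microscopic path
in time. The basic tool to study the statistics of these paths is the generating functional

$$ Z[\psi] = \left\langle \sum_{\boldsymbol{\sigma}(0) \ldots \boldsymbol{\sigma}(t)} P(\boldsymbol{\sigma}(0), \ldots, \boldsymbol{\sigma}(t))\; e^{-i \sum_{i=1}^{N} \sum_{s=0}^{t} \psi_i(s)\, \sigma_i(s)} \right\rangle_{\xi} \tag{28} $$

with $P(\boldsymbol{\sigma}(0), \ldots, \boldsymbol{\sigma}(t))$ the probability to have a certain path in
phase space

$$ P(\boldsymbol{\sigma}(0), \ldots, \boldsymbol{\sigma}(t)) = P(\boldsymbol{\sigma}(0)) \prod_{s=1}^{t} W[\boldsymbol{\sigma}(s-1), \boldsymbol{\sigma}(s)] \tag{29} $$

$$ = P(\boldsymbol{\sigma}(0)) \prod_{s=1}^{t} \prod_{i=1}^{N} P(\sigma_i(s) \mid \boldsymbol{\sigma}(s-1)). \tag{30} $$

Here $W[\boldsymbol{\sigma}, \boldsymbol{\tau}]$ is the transition probability for going from the
configuration $\boldsymbol{\sigma}$ to the configuration $\boldsymbol{\tau}$, and the
$P(\sigma_i(s) \mid \boldsymbol{\sigma}(s-1))$ are given by (26). In (28) the average over the patterns
$\xi$ has to be taken since they are independent identically distributed random variables, determined by
the probability distribution (1).

One can find all physical observables by including a time-dependent external field $\gamma_i(t)$ in (27)
in order to define a response function, and then calculating appropriate derivatives of (28) with respect
to $\psi_i(s)$ or $\gamma_i(t)$, letting all $\psi_i(t)$, $i = 1, \ldots, N$, tend to zero afterwards.
For example we can write the main overlap $m(s)$ (as before we focus on the recall of one pattern), the
correlation function $C(s, s')$ and the response function $G(s, s')$ as

$$ m(s) = \frac{1}{N} \sum_i \xi_i \langle \sigma_i(s) \rangle = i \lim_{\psi \to 0} \frac{1}{N} \sum_i \xi_i \frac{\delta Z[\psi]}{\delta \psi_i(s)} \tag{31} $$

$$ C(s, s') = \frac{1}{N} \sum_i \langle \sigma_i(s) \sigma_i(s') \rangle = -\lim_{\psi \to 0} \frac{1}{N} \sum_i \frac{\delta^2 Z[\psi]}{\delta \psi_i(s)\, \delta \psi_i(s')} \tag{32} $$

$$ G(s, s') = \frac{1}{N} \sum_i \frac{\delta \langle \sigma_i(s) \rangle}{\delta \gamma_i(s')} = i \lim_{\psi \to 0} \frac{1}{N} \sum_i \frac{\delta^2 Z[\psi]}{\delta \psi_i(s)\, \delta \gamma_i(s')}. \tag{33} $$

The further calculation is rather technical, and we point the interested reader to the literature for more
details (e.g., [221[13|). One obtains an effective single neuron local field given by

$$ h(s) = \frac{1}{a(1-a)} (m(s) - a\, q(s))(\xi - a) + \theta(q) + \alpha \sum_{s'=0}^{s-1} R(s, s')\, \sigma(s') + \sqrt{\alpha}\, \eta(s) \tag{34} $$

with $\eta(s)$ temporally correlated noise with zero mean and correlation matrix $D$, and the retarded
self-interaction $R$, which are given by

$$ D = (1 - G)^{-1}\, C\, (1 - G^{\dagger})^{-1} \tag{35} $$

$$ R = (1 - G)^{-1}. \tag{36} $$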


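The final result for the evolution equations of the physical observables is given by four self-consistent
equations

$$ m(s) = \langle \xi\, \sigma(s) \rangle_* \tag{37} $$

$$ q(s) = \langle \sigma(s) \rangle_* \tag{38} $$

$$ C(s, s') = \langle \sigma(s)\, \sigma(s') \rangle_* \tag{39} $$

$$ G(s, s') = \beta \left\langle \sigma(s) \left[ \sigma(s'+1) - \frac{e^{\beta h(\sigma, \eta, s')}}{1 + e^{\beta h(\sigma, \eta, s')}} \right] \right\rangle_*. \tag{40} $$

The average over the effective path measure and the recalled pattern is given by

$$ \langle g \rangle_* = \sum_{\xi} p(\xi) \sum_{\sigma(0), \ldots, \sigma(t)} \int d\eta\; P(\eta)\, P(\sigma \mid \eta)\; g \tag{41} $$

with $p(\xi)$ given by (1), $d\eta = \prod_{s'} d\eta(s')$ and with

$$ P(\eta) = \frac{1}{\sqrt{\det(2\pi D)}} \exp\left( -\frac{1}{2} \sum_{s, s'} \eta(s)\, D^{-1}(s, s')\, \eta(s') \right) \tag{42} $$

$$ P(\sigma \mid \eta) = (1 + m(0)(2\sigma(0) - 1) - \sigma(0)) \prod_{s=1}^{t} \frac{e^{\beta \sigma(s) h(s-1)}}{1 + e^{\beta h(s-1)}}. \tag{43} $$

Remark that the term involving the one-time observables in (34) has the form $(m - aq)$. Therefore,
in the sequel we define the main overlap $M$ as

$$ M = \frac{1}{a(1-a)} (m - aq) \in [-1, 1]. \tag{44} $$

The set of equations (37), (38), (39) and (40) represents an exact dynamical scheme for the evolution
of the network.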

To solve these equations numerically we use the Eisfeller and Opper method ([H]). The algorithm
these authors propose is an advanced Monte-Carlo algorithm. Recalling equation (41), this requires
samples from the correlated noise (for the integrals over $\eta$), the neurons (for the sums) and the
pattern variable $\xi$. Instead of generating the complete vectors at each timestep, we represent these
samples by a large population of individual paths, where each path consists of $t$ neuron values, $t$
noise values and one pattern variable. All the averages (integrations, sums and traces over probability
distributions) can then be represented by summations over this population of single neuron evolutions.
Because of causality, we also know that it is possible to calculate a neuron at time $s$ when we know all
the variables (neurons, noise, physical observables) at previous timesteps. Also, the initial
configuration at time zero is known. This gives rise to an iterative scheme allowing us to numerically
solve the equations at hand.

The main idea then is to represent the average (41) over the statistics of the single particle problem
as an average over the population of single neuron evolutions. Since we did not find an explicit
algorithm in the literature we think that it is very useful to write one down explicitly.

• Choose a large number $K$, the number of independent neuron evolutions in the population, a
final time $t_f$, an activity $a$, a pattern loading $\alpha$, and an initial condition (an initial
overlap, correlation, activity, ...).

• Generate space for $K$ neuron evolutions $p_i$. Each evolution contains a pattern variable
$\xi_i \in \{0, 1\}$, $t_f$ neuron variables $\sigma_i(s) \in \{0, 1\}$, and $t_f$ noise variables
$\eta_i(s) \in \mathbb{R}$, $s = 0 \ldots t_f$, $i = 1 \ldots K$.

• At time 0, initialize the $\xi_i$ according to the distribution (1). Then initialize the neuron
variables at time zero employing the initial condition, e.g.:

When an initial activity is defined:

$$ P(\sigma_i(0) = 1) = q(0) $$

When an initial overlap is defined:

• The algorithm is recursive. So, at time $t$ we assume that we know the neuron variables for all
times $s \le t$, the noise variables for all times $s < t$, and the matrix elements $D(s, s')$ for
$s, s' < t$. We want to first calculate the noise variables at time $t$, and then the neuron variables
at time $t+1$. At timestep $t$ this can be done as follows

1. Calculate the physical observables $m(t)$, $q(t)$ and $C(t, s) = C(s, t)$, $s \le t$, by summing over
the population:

$$ m(t) = \frac{1}{K} \sum_{i=1}^{K} \xi_i\, \sigma_i(t) \tag{45} $$

$$ q(t) = \frac{1}{K} \sum_{i=1}^{K} \sigma_i(t) \tag{46} $$

$$ C(t, s) = \frac{1}{K} \sum_{i=1}^{K} \sigma_i(t)\, \sigma_i(s) \tag{47} $$

2. For $s < t$ calculate the matrix $L$

$$ L(t, s) = \frac{1}{K} \sum_{i=1}^{K} \eta_i(s)\, \sigma_i(t) \tag{48} $$

3. Calculate $G = \alpha^{-1/2} L D^{-1}$, where $D$ is the known noise correlation matrix from
the previous timestep. Turn $G$ into a square matrix by adding a column of zeros to the end.

4. Calculate $R = (1 - G)^{-1}$ and the new $D = R\, C\, R^{T}$.

5. For each site $i$, calculate a new noise variable:

$$ \eta_i(t) = \frac{\zeta_i(t)}{\sqrt{D^{-1}(t, t)}} - \frac{1}{D^{-1}(t, t)} \sum_{s < t} D^{-1}(t, s)\, \eta_i(s) \tag{49} $$

where all $\zeta_i(t)$ are independently chosen from a standard Gaussian distribution.

6. Calculate the effective local field at each site:

$$ h_i(t) = M(t)(\xi_i - a) + \theta(q(t)) + \alpha \sum_{s < t} R(t, s)\, \sigma_i(s) + \sqrt{\alpha}\, \eta_i(t) \tag{50} $$

7. Use this local field to determine the new spin value at each site at time $t+1$:

$$ P(\sigma_i(t+1) = 1) = \frac{e^{\beta h_i(t)}}{1 + e^{\beta h_i(t)}} \tag{51} $$

8. If $t < t_f$ increase $t$ and go to step 1. Else stop.
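Two of the per-site steps of the algorithm can be sketched without any linear algebra. The first function evaluates the population averages of step 1 together with the overlap $M = (m - aq)/(a(1-a))$. The second draws a new noise value as in step 5, assuming the standard conditional form for a zero-mean Gaussian vector with precision matrix $D^{-1}$: mean $-\sum_{s<t} D^{-1}(t,s)\eta_i(s) / D^{-1}(t,t)$ and variance $1/D^{-1}(t,t)$. The function names and the small numbers are hypothetical; the matrix steps 3 and 4 are not shown.

```python
import math
import random


def population_observables(xi, sigma_t, a):
    """Population estimates of m(t), q(t) and of the main overlap M(t)."""
    k = len(xi)
    m = sum(x * s for x, s in zip(xi, sigma_t)) / k
    q = sum(sigma_t) / k
    overlap = (m - a * q) / (a * (1.0 - a))
    return m, q, overlap


def correlation(sigma_t, sigma_s):
    """Population estimate of C(t, s)."""
    return sum(x * y for x, y in zip(sigma_t, sigma_s)) / len(sigma_t)


def sample_noise(precision_row, past_eta, rng):
    """New noise value eta(t) given eta(0..t-1).

    precision_row holds D^{-1}(t, 0), ..., D^{-1}(t, t); the last entry is the
    diagonal element.
    """
    p_tt = precision_row[-1]
    mean = -sum(p * e for p, e in zip(precision_row[:-1], past_eta)) / p_tt
    return mean + rng.gauss(0.0, 1.0) / math.sqrt(p_tt)


xi = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
m, q, overlap = population_observables(xi, xi, a=0.2)
eta_new = sample_noise([2.0, 4.0], [1.0], random.Random(3))
```

Because every path only needs its own history and the shared matrices, these functions can be applied to each member of the population independently.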




Figure 6: The evolution of the overlap of the fully connected network as a function of time $t$ for
several initial overlaps. The system parameters are $\alpha = 0.06$, $a = 0.5$, $T = 0.04$
and $\theta(q) = 0$.



This algorithm can be easily performed in a par- 
allel way. All individual neuron evolutions are inde- 
pendent of each other, and the only steps that can- 
not be executed in a distributed fashion are steps 
3 and 4. It turns out that these two steps mostly 
take less than 1% of the total calculation time. 

5. Thresholds in the fully connected network 

We have used the algorithm above to check the evolution of the overlap. The threshold function
$\theta(q(t))$ appears in the local field (50), and its effect on the evolution of the different physical
observables can be investigated.

We take the size of the population of independent 
neuron evolutions K = 10^. Larger population sizes 
can be obtained by making the algorithm parallel, 
but no significant differences are found. 

We first look at the unbiased case ($a = 1/2$) without threshold. In Fig. 6 we plot the evolution of the
overlap $M$ for several initial conditions. When the initial overlap $M_0$ is too small there is no
retrieval. This critical initial overlap separating a retrieval phase from a non-retrieval phase forms
the border of the basin of attraction. For biased low activity networks, it is already known (e.g., [1])
that a constant threshold $(a - 1/2)$ has to be introduced in the local field, Eq. (25), in order to
guarantee a correct functioning of the network. This can easily be seen by noting that for a network
where only one single pattern is stored ($h_i = \xi_i - a$) the field becomes $(1 - a)$ or $(-a)$, which
lies completely asymmetric with respect to the symmetric (around the point 1/2) transfer function,
Eq. (26). For $a \to 0$ one even finds that the probability that a neuron changes its state from zero to
one becomes 1/2.

Figure 7: Phases in the $T$-$\alpha$ plane for, from left to right, $a = 0.5$, $a = 0.1$, $a = 0.05$ with
$\theta(q) = a - 0.5$. Solid (dashed) lines indicate the results for the dynamics (statics).

Figure 8: Phases in the $T$-$\alpha$ plane for $a = 0.005$ and several thresholds. Solid:
$\theta = a - 0.5$; dashed: self-control threshold without $T$-correction; dotted: self-control threshold
with $T$-correction.

A $T$-$\alpha$ plot for several values of the activity with $\theta(q) = a - 0.5$ is presented in Fig. 7.
The solid lines represent the results from the dynamics obtained by initializing the algorithm discussed
in Section 4 with an initial overlap $M_0 = 1$, and determining the temperature where this overlap has
decreased below 0.4 after 200 timesteps. For comparison the dashed lines show the results from an
equilibrium statistical mechanics calculation (e.g., [551 HHj). As expected, both calculations agree.
These lines indicate two phases of the fully connected model: below the lines our model allows recall,
above the lines it does not.

The main question we want to address in this Section is whether we can again improve the retrieval
capacities of this network architecture by introducing the self-control threshold (23). We recall that
the quantity $D(t)$ occurring in this expression contains the influence of the cross-talk noise. From the
signal-to-noise ratio analysis in [10] and from statistical neurodynamics arguments ([20]) we know that
the leading term of $D(t)$ is $q(t)$. Moreover, from a biological point of view, it does not seem
plausible that a network monitors the statistical quantity of the cross-talk noise. Therefore, we take
$D(t) = q(t)$ in the self-control threshold in fully connected networks.

We have then solved the generating functional analysis (37)-(40) with the threshold

$$ \theta(q(t)) = \sqrt{-2 \ln(a)\, \alpha\, q(t)} - \frac{1}{2} \ln(a)\, T^2. \tag{52} $$

Some typical results are shown in Figs. 8-10. For system parameters comparable with those for the
layered architecture, Fig. 8 clearly shows that the self-control threshold without $T$-correction
significantly increases the retrieval region, and the temperature correction further improves the results
for $\alpha$ not too small.

Looking at a fixed $T = 0.1$ for this case (Fig. 9), we furthermore notice that the self-control
threshold without $T$-correction again significantly increases the basin of attraction. The additional
temperature correction further increases this basin, and even increases the maximal achievable pattern
loading $\alpha$.

For lower temperatures (Fig. 10) the self-control threshold still increases the basin of attraction for
larger values of the pattern loading $\alpha$, but for smaller loadings the effect is diminishing. The
temperature correction gives no clear improvement in this case. A similar behavior was observed for the
layered architecture in Fig. 2. We remark that the subtraction of $(a - 1/2)$ is not necessary when using
the self-control method. The latter takes this into account automatically and the network operates
fully autonomously.

Figure 9: The basin of attraction as a function of $\alpha$ for $a = 0.005$ and $T = 0.1$. Solid:
$\theta = a - 0.5$; dashed: self-control threshold without $T$-correction; dotted: self-control threshold
with $T$-correction.

Figure 10: The basin of attraction as a function of $\alpha$ for $a = 0.01$ and $T = 0.05$. Solid:
constant $\theta = a - 0.5$; dashed: self-control threshold without $T$-correction; dotted: self-control
threshold with $T$-correction.

6. Conclusions 

In this work we have studied the inclusion of an adaptive threshold in sparsely coded layered
and fully connected neural networks with synaptic noise. We have presented an analytic form for
a self-control threshold, allowing an autonomous functioning of these networks, and compared it, for
the layered architecture, with an optimal threshold obtained by maximizing the mutual information,
which has to be calculated externally each time one of the network parameters (activity, loading,
temperature) is changed. The consequences of this self-control mechanism on the quality of the recall
process have been studied.

We find that the basins of attraction of the re- 
trieval solutions as well as the storage capacity 
are enlarged. For some activities the self-control 
threshold even sets the border between retrieval and 
non-retrieval. This confirms the considerable im- 
provement of the quality of recall by self-control, 
also for layered and fully connected network mod- 
els with synaptic noise. 

This allows us to conjecture that self-control 
might be relevant even for dynamical systems in 



general, when trying to improve, e.g., basins of 
attraction. 